A Machine Learning Model for Information Retrieval with Structured Documents
Abstract
Most recent document standards rely on structured representations. Current information retrieval systems, on the other hand, were developed for flat document representations and cannot easily be extended to cope with more complex document types. Only a few models have been proposed for handling structured documents, and the design of such systems is still an open problem. We present a new model for structured document retrieval that computes and combines the scores of document parts. It is based on Bayesian networks and allows the model parameters to be learned in the presence of incomplete data. We present an application of this model to ad-hoc retrieval and evaluate its performance on a small structured collection. The model can also be extended to other tasks, such as interactive navigation in structured documents or corpora.
Related resources
Self-paced Compensatory Deep Boltzmann Machine for Semi-Structured Document Embedding
In the last decade, a huge number of documents with different types of rich metadata, known as Semi-Structured Documents (SSDs), have appeared in many real applications. Modeling this type of text data in the way humans understand text with informative metadata is an interesting research problem. In the paper, we introduce a Self-paced Compensatory ...
An Ensemble-Learning-Based Algorithm for Learning to Rank in Information Retrieval
Learning to rank refers to machine learning techniques for training a model in a ranking task. Learning to rank has been shown to be useful in many applications of information retrieval, natural language processing, and data mining. Learning to rank can be described by two systems: a learning system and a ranking system. The learning system takes training data as input and constructs a ranking ...
A Structured Information Extraction Algorithm for Scientific Papers based on Feature Rules Learning
Traditional scientific papers are unstructured documents, which makes it difficult to meet the requirements of structured retrieval, statistical classification, association analysis, and other high-level applications. Hence, how to extract and analyze the structured information of the papers becomes a challenging problem. A structured information extraction algorithm is proposed for unstructured and...
Handling Texts - A Challenge for Data Mining
The amount of data in free form by far surpasses the structured records in databases in their number. However, standard learning algorithms require observations in the form of vectors given a fixed set of attributes. For texts, there is no such fixed set of attributes. The bag of words representation yields vectors with as many components as there are words in a language. Hence, the classificat...
Domain Knowledge Acquisition for Information Retrieval using Neural Networks
This paper presents the results of some experiments investigating the use of Neural Networks in the learning engine of a Connectionist Information Retrieval system called CIRS. CIRS uses the learning and generalisation capabilities of the Back Propagation learning algorithm to acquire and use application domain knowledge in the form of a sub-symbolic knowledge representation. This paper descri...
Publication date: 2003